Image Categorization and Search via a GAT Autoencoder and Representative Models

Sap, Duygu, Lotz, Martin, Mattinson, Connor

arXiv.org Artificial Intelligence

We propose a method for image categorization and retrieval that leverages graphs and a graph attention network (GAT)-based autoencoder. Our approach is representative-centric, that is, we execute the categorization and retrieval process via the representative models we construct for the images and image categories. We utilize a graph where nodes represent images (or their representatives) and edges capture similarity relationships. GAT highlights important features and relationships between images, enabling the autoencoder to construct context-aware latent representations that capture the key features of each image relative to its neighbors. We obtain category representatives from these embeddings and categorize a query image by comparing its representative to the category representatives. We then retrieve the most similar image to the query image within its identified category. We demonstrate the effectiveness of our representative-centric approach through experiments with both the GAT autoencoders and standard feature-based techniques.
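The representative-centric pipeline described above can be sketched in a few lines once the embeddings exist. This is a minimal illustration, not the paper's implementation: it assumes the GAT autoencoder has already produced one embedding per image, takes the mean embedding as each category's representative, and uses cosine similarity for both categorization and within-category retrieval. All names and the toy data are hypothetical.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def category_representatives(embeddings, labels):
    # One representative per category: the mean of its member embeddings.
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def categorize_and_retrieve(query, embeddings, labels, reps):
    # Step 1: assign the query to the category with the most similar representative.
    best_cat = max(reps, key=lambda c: cosine(query, reps[c]))
    # Step 2: within that category, return the most similar individual image.
    idx = np.where(labels == best_cat)[0]
    best_img = idx[np.argmax([cosine(query, embeddings[i]) for i in idx])]
    return best_cat, int(best_img)
```

In practice the embeddings would be the context-aware latent representations from the autoencoder; the mean is only one plausible choice of representative.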


Hierarchy-of-Visual-Words: a Learning-based Approach for Trademark Image Retrieval

Lourenço, Vítor N., Silva, Gabriela G., Fernandes, Leandro A. F.

arXiv.org Artificial Intelligence

From the background, the procedure extracts the holes' shapes and associates them with the component shapes' list (lines 7 and 8). The foreground shapes are used in the next iterations (lines 5 and 9) until all component shapes have been extracted from the initial binary trademark image. Shape feature extraction consists of building a feature vector for each component shape of a given trademark image (Figs. 1 (d) and (k)). These 29-dimensional feature vectors combine region-based and contour-based descriptors. A shape's region is described by the 25 moments of the Zernike polynomials (ZM) of order $p$ from 0 to 8: $Z_{p,q} = \frac{p+1}{\pi} \sum_{\rho} \sum_{\theta} V^{*}_{p,q}(\rho, \theta)\, I(\rho, \theta)$ (1), where $\rho = \sqrt{x^2 + y^2}$ is the length of the vector from the origin to pixel $(x, y)$, $\theta$ is the angle between the vector defining $\rho$ and the $x$-axis in the counterclockwise direction, and $V_{p,q}(\rho, \theta)$ is a Zernike polynomial of order $p$ with repetition $q$ that forms a complete set over the interior of the unit disk inscribing the component shape: $V_{p,q}(\rho, \theta) = R_{p,q}(\rho) \exp(i q \theta)$.
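A discrete version of the Zernike moment in Eq. (1) can be computed directly: rasterize the shape on a grid inscribed in the unit disk, evaluate the standard radial polynomial $R_{p,q}$, and sum $V^{*}_{p,q} I$ over the disk. This is a hedged sketch of the textbook formula, not the paper's code; the grid resolution and normalization are illustrative choices.

```python
import math
import numpy as np

def radial_poly(p, q, rho):
    # Standard Zernike radial polynomial R_{p,q}(rho); requires p - |q| even.
    q = abs(q)
    R = np.zeros_like(rho)
    for s in range((p - q) // 2 + 1):
        c = ((-1) ** s * math.factorial(p - s)
             / (math.factorial(s)
                * math.factorial((p + q) // 2 - s)
                * math.factorial((p - q) // 2 - s)))
        R += c * rho ** (p - 2 * s)
    return R

def zernike_moment(img, p, q):
    # Discrete approximation of Z_{p,q} = (p+1)/pi * sum_{rho,theta}
    # conj(V_{p,q}(rho, theta)) * I(rho, theta) over the inscribing unit disk.
    n = img.shape[0]
    ys, xs = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
    rho = np.hypot(xs, ys)
    theta = np.arctan2(ys, xs)
    mask = rho <= 1.0                      # only pixels inside the unit disk count
    V = radial_poly(p, q, rho) * np.exp(1j * q * theta)
    dA = (2.0 / n) ** 2                    # pixel area on the [-1, 1]^2 grid
    return (p + 1) / np.pi * np.sum(np.conj(V)[mask] * img[mask]) * dA
```

As a sanity check, a constant image gives $Z_{0,0} \approx 1$, and $Z_{1,1}$ of any centrally symmetric image vanishes.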


Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation

Chu, Yun-Wei, Zhang, Kai, Malon, Christopher, Min, Martin Renqiang

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) have shown impressive performance in vision and text tasks. However, hallucination remains a major challenge, especially in fields like healthcare where details are critical. In this work, we show how MLLMs may be enhanced to support Visual RAG (V-RAG), a retrieval-augmented generation framework that incorporates both text and visual data from retrieved images. On the MIMIC-CXR chest X-ray report generation and Multicare medical image caption generation datasets, we show that Visual RAG improves the accuracy of entity probing, which asks whether a medical entity is grounded by an image. We show that the improvements extend to both frequent and rare entities, the latter of which may have less positive training data. Downstream, we apply V-RAG with entity probing to correct hallucinations and generate more clinically accurate X-ray reports, obtaining a higher RadGraph-F1 score.
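The retrieve-then-probe idea can be illustrated without a real MLLM. In this sketch, retrieval is cosine similarity over image embeddings, and the probe is a crude keyword check over the retrieved reports standing in for the model's yes/no answer; entities without enough retrieved support are dropped from the draft. Everything here (function names, thresholds, toy reports) is hypothetical, not the paper's system.

```python
import numpy as np

def retrieve_similar(query_emb, image_embs, k=3):
    # Rank the image corpus by cosine similarity to the query embedding.
    q = query_emb / np.linalg.norm(query_emb)
    M = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    return np.argsort(M @ q)[::-1][:k]

def probe_entity(entity, retrieved_reports, min_support=2):
    # Stand-in for the MLLM probe: an entity counts as grounded if enough
    # retrieved reports mention it.
    support = sum(entity.lower() in r.lower() for r in retrieved_reports)
    return support >= min_support

def correct_report(draft_entities, query_emb, image_embs, reports):
    # Keep only draft entities supported by the retrieved evidence.
    idx = retrieve_similar(query_emb, image_embs)
    evidence = [reports[i] for i in idx]
    return [e for e in draft_entities if probe_entity(e, evidence)]
```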


Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?

Mayilvahanan, Prasanna, Wiedemer, Thaddäus, Rusak, Evgenia, Bethge, Matthias, Brendel, Wieland

arXiv.org Artificial Intelligence

Foundation models like CLIP are trained on hundreds of millions of samples and effortlessly generalize to new tasks and inputs. Out of the box, CLIP shows stellar zero-shot and few-shot capabilities on a wide range of out-of-distribution (OOD) benchmarks, which prior works attribute mainly to today's large and comprehensive training dataset (like LAION). However, it is questionable how meaningful terms like out-of-distribution generalization are for CLIP as it seems likely that web-scale datasets like LAION simply contain many samples that are similar to common OOD benchmarks originally designed for ImageNet. To test this hypothesis, we retrain CLIP on pruned LAION splits that replicate ImageNet's train-test similarity with respect to common OOD benchmarks. While we observe a performance drop on some benchmarks, surprisingly, CLIP's overall performance remains high. This shows that high train-test similarity is insufficient to explain CLIP's OOD performance, and other properties of the training data must drive CLIP to learn more generalizable representations. Additionally, by pruning data points that are dissimilar to the OOD benchmarks, we uncover a 100M split of LAION ($\frac{1}{4}$th of its original size) on which CLIP can be trained to match its original OOD performance.
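The pruning step at the heart of this experiment is conceptually simple: for every training sample, find its highest similarity to any benchmark test sample and discard it if that similarity exceeds a cap. The sketch below shows one plausible formulation over precomputed embeddings; the threshold and the exact similarity statistic are assumptions, not the paper's protocol.

```python
import numpy as np

def prune_by_test_similarity(train_embs, test_embs, threshold):
    # For each training sample, compute its maximum cosine similarity to any
    # test sample; keep only training samples strictly below the threshold.
    T = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    S = test_embs / np.linalg.norm(test_embs, axis=1, keepdims=True)
    nearest = (T @ S.T).max(axis=1)        # per-train-sample nearest-test similarity
    return np.where(nearest < threshold)[0]
```

At LAION scale the dense `T @ S.T` product would be replaced by an approximate nearest-neighbor index, but the statistic is the same.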


CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy with a Novel Metric

Borgersen, Karl Audun Kagnes, Goodwin, Morten, Sharma, Jivitesh, Aasmoe, Tobias, Leonhardsen, Mari, Rørvik, Gro Herredsvela

arXiv.org Artificial Intelligence

Detecting visually similar images is a particularly useful attribute to look to when calculating product recommendations. Embedding similarity, which utilizes pre-trained computer vision models to extract high-level image features, has demonstrated remarkable efficacy in identifying images with similar compositions. However, there is a lack of methods for evaluating the embeddings generated by these models, as conventional loss and performance metrics do not adequately capture their performance in image similarity search tasks. In this paper, we evaluate the viability of the image embeddings from numerous pre-trained computer vision models using a novel approach named CorrEmbed. Our approach computes the correlation between distances in image embeddings and distances in human-generated tag vectors. We extensively evaluate numerous pre-trained Torchvision models using this metric, revealing an intuitive relationship of linear scaling between ImageNet1k accuracy scores and tag-correlation scores. Importantly, our method also identifies deviations from this pattern, providing insights into how different models capture high-level image features. By offering a robust performance evaluation of these pre-trained models, CorrEmbed serves as a valuable tool for researchers and practitioners seeking to develop effective, data-driven approaches to similar item recommendations in fashion retail.
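The metric described here reduces to correlating two lists of pairwise distances. A minimal version, assuming Euclidean distances and Pearson correlation (illustrative choices, not necessarily the paper's exact configuration):

```python
import numpy as np

def correembed_score(embeddings, tag_vectors):
    # Pairwise distances in embedding space and in human-tag space, then the
    # Pearson correlation between the two distance lists over all pairs.
    n = len(embeddings)
    iu = np.triu_indices(n, k=1)           # each unordered pair counted once
    d_emb = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)[iu]
    d_tag = np.linalg.norm(tag_vectors[:, None] - tag_vectors[None, :], axis=-1)[iu]
    return float(np.corrcoef(d_emb, d_tag)[0, 1])
```

A score near 1 means the embedding's notion of distance tracks the human-generated tags; embeddings that rank ImageNet1k well but score poorly here would be the deviations the paper highlights.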


SUVR: A Search-based Approach to Unsupervised Visual Representation Learning

Xu, Yi-Zhan, Chen, Chih-Yao, Li, Cheng-Te

arXiv.org Artificial Intelligence

Unsupervised learning has grown in popularity because of the difficulty of collecting annotated data and the development of modern frameworks that allow us to learn from unlabeled data. Existing studies, however, either disregard variations at different levels of similarity or only consider negative samples from one batch. We argue that image pairs should have varying degrees of similarity, and the negative samples should be allowed to be drawn from the entire dataset. In this work, we propose Search-based Unsupervised Visual Representation Learning (SUVR) to learn better image representations in an unsupervised manner. We first construct a graph from the image dataset by the similarity between images, and adopt the concept of graph traversal to explore positive samples. In the meantime, we make sure that negative samples can be drawn from the full dataset. Quantitative experiments on five benchmark image classification datasets demonstrate that SUVR can significantly outperform strong competing methods on unsupervised embedding learning. Qualitative experiments also show that SUVR can produce better representations in which similar images are clustered closer together than unrelated images in the latent space.
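The two ingredients the abstract emphasizes, graph traversal for positives of varying similarity and dataset-wide negatives, can be sketched as follows. This is a simplified reading, with a kNN similarity graph, BFS depth as the similarity level, and uniform negative sampling outside the positive set; all parameters are illustrative.

```python
import numpy as np
from collections import deque

def knn_graph(embs, k=2):
    # Adjacency list: each image links to its k most similar neighbours.
    X = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)        # no self-edges
    return {i: list(np.argsort(sims[i])[::-1][:k]) for i in range(len(embs))}

def positives_by_depth(graph, anchor, max_depth=2):
    # Breadth-first traversal: nodes reached at greater depth act as weaker
    # positives, giving varying degrees of similarity.
    depth = {anchor: 0}
    queue = deque([anchor])
    while queue:
        u = queue.popleft()
        if depth[u] == max_depth:
            continue
        for v in graph[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return {v: d for v, d in depth.items() if v != anchor}

def sample_negatives(n_total, positives, anchor, rng, size=2):
    # Negatives may come from anywhere in the dataset, not just one batch.
    pool = [i for i in range(n_total) if i != anchor and i not in positives]
    return list(rng.choice(pool, size=min(size, len(pool)), replace=False))
```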


Focus on the Challenges: Analysis of a User-friendly Data Search Approach with CLIP in the Automotive Domain

Rigoll, Philipp, Petersen, Patrick, Stage, Hanno, Ries, Lennart, Sax, Eric

arXiv.org Artificial Intelligence

Handling large amounts of data has become a key for developing automated driving systems. Especially for developing highly automated driving functions, working with images has become increasingly challenging due to the sheer size of the required data. Such data has to satisfy different requirements to be usable in machine learning-based approaches. Thus, engineers need to fully understand their large image data sets for the development and test of machine learning algorithms. However, current approaches lack automatability, are not generic and are limited in their expressiveness. Hence, this paper aims to analyze a state-of-the-art text and image embedding neural network and guides through the application in the automotive domain. This approach enables the search for similar images and the search based on a human understandable text-based description. Our experiments show the automatability and generalizability of our proposed method for handling large data sets in the automotive domain.
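The core search operation is a similarity ranking in CLIP's shared embedding space: a text query is encoded once and compared against precomputed image embeddings. A minimal sketch with toy vectors standing in for real CLIP encoder outputs:

```python
import numpy as np

def text_to_image_search(text_emb, image_embs, k=5):
    # Rank images by cosine similarity between the shared-space text embedding
    # and each image embedding, as produced by a CLIP-style dual encoder.
    t = text_emb / np.linalg.norm(text_emb)
    M = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = M @ t
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]
```

The same function serves image-to-image search by passing an image embedding as the query, which covers both search modes the paper analyzes.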


Image deduplication using OpenAI's CLIP and Community Detection

#artificialintelligence

A short guide on how to use image embeddings from OpenAI's CLIP and clustering techniques to group near-duplicate images together. CLIP is trained by aligning image-text embedding pairs, or "learning visual representations from natural language supervision". You can use its text or image embeddings to accomplish many different tasks, such as zero-shot image classification! Its embeddings are pretty powerful. For this task, we're going to use the AirBnB Duplicate Image Dataset, available on Kaggle.
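The grouping step can be reduced to its simplest form: link every pair of images whose embedding similarity exceeds a threshold, then read off the connected components as duplicate groups. The guide uses a community-detection routine for this; connected components via union-find, shown below, is a basic stand-in, and the threshold is an illustrative choice.

```python
import numpy as np

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def near_duplicate_groups(embs, threshold=0.95):
    # Link pairs whose cosine similarity exceeds the threshold, then return
    # the connected components that contain more than one image.
    X = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = X @ X.T
    uf = UnionFind(len(embs))
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            if sims[i, j] > threshold:
                uf.union(i, j)
    groups = {}
    for i in range(len(embs)):
        groups.setdefault(uf.find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```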


Fashion-model pose recommendation and generation using Machine Learning

Kannumuru, Vijitha, P, Santhosh Kannan S, Shankar, Krithiga, Larnyoh, Joy, Mahadevan, Rohith, Raman, Raja CSP

arXiv.org Artificial Intelligence

Fashion-model pose is an important attribute in the fashion industry. Creative directors, modeling production houses, and top photographers always look for professional models who are able to pose. Without the skill to pose correctly, a model's chances of landing professional modeling employment are regrettably slim. There are occasions when models and photographers are unsure of the best pose to strike while taking photographs. This research concentrates on suggesting to fashion personnel a series of similar images based on an input image. The image is segmented into different parts, and similar images are suggested to the user. This was achieved by calculating the color histogram of the input image, computing the same for all the images in the dataset, and comparing the histograms. Synthetic images have become popular to avoid privacy concerns and to overcome the high cost of photoshoots. Hence, this paper also extends the work of generating synthetic images from the recommendation engine using StyleGAN to an extent.
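The histogram-comparison step described above can be sketched directly. This assumes 8-bit RGB images and uses histogram intersection as the comparison measure; the bin count and measure are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

def color_histogram(img, bins=8):
    # Per-channel histogram over the 0..255 range, concatenated and normalized
    # so that images of different sizes remain comparable.
    hist = np.concatenate([
        np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    # 1.0 for identical color distributions, smaller for dissimilar ones.
    return float(np.minimum(h1, h2).sum())

def recommend(query_img, dataset_imgs, k=3):
    # Rank dataset images by color-histogram similarity to the query.
    q = color_histogram(query_img)
    scores = [histogram_intersection(q, color_histogram(im)) for im in dataset_imgs]
    return list(np.argsort(scores)[::-1][:k])
```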


Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions

Verma, Gaurav, Vinay, Vishwa, Rossi, Ryan A., Kumar, Srijan

arXiv.org Artificial Intelligence

As multimodal learning finds applications in a wide variety of high-stakes societal tasks, investigating their robustness becomes important. Existing work has focused on understanding the robustness of vision-and-language models to imperceptible variations on benchmark tasks. In this work, we investigate the robustness of multimodal classifiers to cross-modal dilutions - a plausible variation. We develop a model that, given a multimodal (image + text) input, generates additional dilution text that (a) maintains relevance and topical coherence with the image and existing text, and (b) when added to the original text, leads to misclassification of the multimodal input. Via experiments on Crisis Humanitarianism and Sentiment Detection tasks, we find that the performance of task-specific fusion-based multimodal classifiers drops by 23.3% and 22.5%, respectively, in the presence of dilutions generated by our model. Metric-based comparisons with several baselines and human evaluations indicate that our dilutions show higher relevance and topical coherence, while simultaneously being more effective at demonstrating the brittleness of the multimodal classifiers. Our work aims to highlight and encourage further research on the robustness of deep multimodal models to realistic variations, especially in human-facing societal applications. The code and other resources are available at https://claws-lab.github.io/multimodal-robustness/.
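The headline numbers (23.3% and 22.5%) are relative accuracy drops under dilution, and that evaluation harness is easy to state abstractly: run a classifier on clean inputs, append the generated dilution text, and compare. The sketch below uses stub functions for the classifier and the dilution generator; both are placeholders, not the paper's models.

```python
import numpy as np

def accuracy(classify, inputs, labels):
    # Fraction of (image, text) inputs the classifier labels correctly.
    return float(np.mean([classify(img, txt) == y
                          for (img, txt), y in zip(inputs, labels)]))

def dilution_drop(classify, inputs, labels, dilute):
    # Relative accuracy drop when dilution text is appended to each input;
    # `dilute` stands in for the paper's relevance-preserving generator.
    clean = accuracy(classify, inputs, labels)
    diluted_inputs = [(img, txt + " " + dilute(img, txt)) for img, txt in inputs]
    return (clean - accuracy(classify, diluted_inputs, labels)) / clean
```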